Efficient p-value estimation in massively parallel testing problems

نویسندگان

  • Rafal Kustra
  • Xiaofei Shi
  • Duncan J. Murdoch
  • Celia M. T. Greenwood
  • Jagadish Rangrej
چکیده

We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer using the data from the first phase of a large, genome-wide case-control study. The results show enormous computational savings as compared to evaluating a full set of permutations, with little decrease in accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-linear Time-series Prediction by Systematic Data Exploration on a Massively Parallel Computer

In this paper we describe the application of massively parallel processing (MPP) to the problem of time-series analysis. Time-series problems involve sequences of numbers (for example, the daily closing values of the stock market, EEG patterns of brainwave activity, or, as discussed in this paper, the temporal values from set of equations of motion). Often the problem of interest is the predict...

متن کامل

N-ary Speculative Computation of Simulated Annealing on the AP1000 Massively Parallel Multiprocessor

Simulated annealing is known to be an efficient method for combinatorial optimization problems. Its usage for realistic problem size, however, has been limited by the long execution time due to its sequential nature. This report presents a practical approach to synchronous simulated annealing for massively parallel distributed-memory multiprocessors. We use an n-ary speculative tree to execute ...

متن کامل

A fixed and flexible maintenance operations planning optimization in a parallel batch machines manufacturing system

Scheduling has become an attractive area for artificial intelligence researchers. On other hand, in today's real-world manufacturing systems, the importance of an efficient maintenance schedule program cannot be ignored because it plays an important role in the success of manufacturing facilities. A maintenance program may be considered as the heath care of manufacturing machines and equipments...

متن کامل

Neural Network Implementation in SAS

The estimation or training methods in the neural network literature are usually some simple form of gradient descent algorithm suitable for implementation in hardware using massively parallel computations. For ordinary computers that are not massively parallel, optimization algorithms such as those in several SAS procedures are usually far more efficient. This talk shows how to fit neural netwo...

متن کامل

Estimation of Software Reliability by Sequential Testing with Simulated Annealing of Mean Field Approximation

Various problems of combinatorial optimization and permutation can be solved with neural network optimization. The problem of estimating the software reliability can be solved with the optimization of failed components to its minimum value. Various solutions of the problem of estimating the software reliability have been given. These solutions are exact and heuristic, but all the exact approach...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Biostatistics (Oxford, England)

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2008